Technique for automatic sentence level alignment of long speech and transcripts

Authors

Imran Ahmed

Sunil Kumar Kopparapu

Abstract

A frugal approach to constructing speech corpora, especially for resource-deficient languages, is to exploit collections of speech and corresponding text available in audio books, news broadcasts, and lectures. However, using these resources to build speech corpora requires aligning the long-duration speech data with the accompanying text. Existing techniques for automatic speech-text alignment of long audio files assume the availability of a basic speech recognition engine and hence cannot be used directly for resource-deficient languages. In this paper, we propose a novel technique for sentence-level alignment of long speech-text data that exploits the syllable information in the speech and text. The proposed technique does not depend on any speech recognition models and hence can be used for resource-deficient languages.
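The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of recognizer-free, syllable-based alignment, not the authors' method. It assumes (a) that text-side syllables can be approximated by counting vowel runs in Latin-script words, (b) that syllable nuclei in speech appear as peaks in short-time energy, and (c) that sentence boundaries can be placed by distributing the detected nuclei in proportion to each sentence's syllable count. The function names and parameters (`count_syllables`, `syllable_nuclei`, `align_sentences`, `rel_threshold`, `min_gap_ms`) are hypothetical.

```python
import re
from typing import List, Tuple


def count_syllables(sentence: str) -> int:
    """Hypothetical text-side estimate: count runs of vowel letters per word.
    Assumes Latin-script orthography; a real system would need a
    language-specific syllabifier."""
    words = re.findall(r"[a-zA-Z]+", sentence.lower())
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)


def syllable_nuclei(samples: List[float], sr: int, frame_ms: float = 20.0,
                    rel_threshold: float = 0.3,
                    min_gap_ms: float = 100.0) -> List[float]:
    """Hypothetical speech-side estimate: times (seconds) of local peaks in
    short-time energy. Peaks below rel_threshold * max energy, or closer than
    min_gap_ms to the previously accepted peak, are ignored."""
    frame = max(1, int(sr * frame_ms / 1000))
    energy = [sum(x * x for x in samples[i:i + frame])
              for i in range(0, len(samples) - frame + 1, frame)]
    if not energy:
        return []
    floor = rel_threshold * max(energy)
    min_gap_frames = min_gap_ms / frame_ms
    peaks: List[int] = []
    for i, e in enumerate(energy):
        left = energy[i - 1] if i > 0 else 0.0
        right = energy[i + 1] if i + 1 < len(energy) else 0.0
        # Rising edge of a local maximum that is loud enough.
        if e >= floor and e > left and e >= right:
            if not peaks or i - peaks[-1] >= min_gap_frames:
                peaks.append(i)
    # Report the centre of each peak frame.
    return [(i + 0.5) * frame / sr for i in peaks]


def align_sentences(sentences: List[str], nuclei: List[float],
                    duration: float) -> List[Tuple[float, float]]:
    """Give each sentence a (start, end) span by assigning detected nuclei in
    proportion to text syllable counts, cutting midway between the last
    nucleus of one sentence and the first nucleus of the next."""
    counts = [count_syllables(s) for s in sentences]
    total = sum(counts)
    if total == 0 or len(nuclei) < len(sentences):
        raise ValueError("need text syllables and at least one nucleus per sentence")
    scale = len(nuclei) / total  # compensates for over/under-detection
    spans: List[Tuple[float, float]] = []
    start, cum = 0.0, 0
    for idx, c in enumerate(counts):
        cum += c
        if idx == len(counts) - 1:
            end = duration
        else:
            # k = number of nuclei assigned to sentences 0..idx
            k = min(len(nuclei) - 1, max(1, round(cum * scale)))
            end = (nuclei[k - 1] + nuclei[k]) / 2
        spans.append((start, end))
        start = end
    return spans


if __name__ == "__main__":
    sr = 1000
    # Synthetic "speech": six 20 ms bursts, one every 200 ms.
    samples = ([1.0] * 20 + [0.0] * 180) * 6
    nuclei = syllable_nuclei(samples, sr)
    print(align_sentences(["Hello there.", "Good day."], nuclei,
                          len(samples) / sr))
```

Rescaling by the ratio of detected nuclei to expected text syllables is one simple way to absorb systematic detection bias; a practical system for long recordings would also have to cope with pauses, noise, and the syllable structure of the target language.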

Similar resources

Fast Automatic Alignment of Video and Text for Search/Names and Faces

We propose a novel way of aligning the audio/video and text streams, which is faster than conventional speech recognition, and requires no supervision. Multimedia of this form includes news broadcast with summaries, parliament proceedings and court trials with transcripts, etc. In addition to applications to video search using the text based indexing, we also show how we can annotate the video ...

Automatic building of synthetic voices from large multi-paragraph speech databases

Large multi-paragraph speech databases encapsulate prosodic and contextual information beyond the sentence level, which could be exploited to build natural-sounding voices. This paper discusses our efforts on automatic building of synthetic voices from large multi-paragraph speech databases. We show that the primary issue of segmentation of a large speech file could be addressed with modifications...

SParseval: Evaluation Metrics for Parsing Speech

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not match the words in the gold standard parse tree exactly, and word errors are unavoidable with automa...

A comparison of different machine learning methods for extractive speech-to-speech summarization of Persian without using transcripts

In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech in order to access, search, extract, and browse speech files more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...

An investigation of speech quality indices in typically developing Persian-speaking children aged 4-5 in the cities of Semnan, Birjand and Tonekabon, 1383 (Iranian calendar)

Background and purpose: We can examine the language abilities of a person through five parameters of speech quality: speech fluency, speech complexity, speech exactness, speech rate, and lexical accessibility. These parameters are examined through secondary parameters, including mean length of utterance (MLU), mean length of the five longest utterances, mean number of verbs per sentence, mean nu...

Publication date: 2013